A Simple, Fast, and Compact Static Dictionary

Authors

  • Scott Schneider
  • Michael Spertus
Abstract

We present a new static dictionary that is very fast and compact, while also being extremely easy to implement. A combination of properties makes this algorithm very attractive for applications requiring large static dictionaries:

1. High performance, with membership queries taking O(1) time with a near-optimal constant.
2. Continued high performance in external memory, with queries requiring only 1-2 disk seeks. If the dictionary has $n$ items in $\{0, \ldots, m-1\}$ and $d$ is the number of bytes retrieved from disk on each read, then the average number of seeks is $\min\left(1.63,\ 1 + O\left(\frac{\sqrt{n}\,\log m}{d}\right)\right)$.
3. Efficient use of space, storing $n$ items from a universe of size $m$ in $n \log m - \frac{1}{2} n \log n + O(n + \log\log m)$ bits. We prove this space bound with a novel application of the Kolmogorov-Smirnov distribution.
4. Simplicity, with a 20-line pseudo-code construction algorithm and a 4-line query algorithm.
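To get a feel for the space bound in item 3, the following sketch evaluates only its leading terms and compares them with a plain sorted array. It assumes base-2 logarithms and drops the O(n + log log m) term, since the abstract gives no constant for it; the function names are hypothetical and this is not the authors' construction.

```python
import math


def leading_space_bits(n: int, m: int) -> float:
    """Leading terms n*log2(m) - (1/2)*n*log2(n) of the stated space bound.

    Assumption: logs are base 2. The O(n + log log m) term is omitted
    because its constant is not given in the abstract.
    """
    return n * math.log2(m) - 0.5 * n * math.log2(n)


def naive_array_bits(n: int, m: int) -> int:
    """Bits used by a plain array storing each item in ceil(log2(m)) bits."""
    return n * math.ceil(math.log2(m))


if __name__ == "__main__":
    n, m = 2**20, 2**32
    print("leading terms, bits per item:", leading_space_bits(n, m) / n)
    print("plain array, bits per item:  ", naive_array_bits(n, m) / n)
```

For about a million 32-bit keys, the leading terms come to 22 bits per item against 32 for the plain array, before the lower-order term is added back.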


Similar resources

Osteogenic Differentiation and Mineralization on Compact Multilayer nHA-PCL Electrospun Scaffolds in a Perfusion Bioreactor

Background: Monolayer electrospun scaffolds have already been used in bone tissue engineering due to their high surface-to-volume ratio, interconnectivity, similarity to natural bone extracellular matrix (ECM), and simple production. Objectives: The aim of this study was to evaluate the dynamic culture effect on osteogenic differentiation and mineralization into a compact cellular multilayer ...


Simple, compact and robust approximate string dictionary

This paper is concerned with practical implementations of approximate string dictionaries that allow edit errors. In this problem, we have as input a dictionary D of d strings of total length n over an alphabet of size σ. Given a bound k and a pattern x of length m, a query has to return all the strings of the dictionary which are at edit distance at most k from x, where the edit distance betwe...


Czech-English Machine Translation Dictionary

We propose a format for translation dictionaries suitable for machine translation. The dictionary format is concise and generalizes phrases by introducing rules for morphological generation instead of using simple phrase-to-phrase mapping. We describe a simple way to automatically construct our compact entries from a machine-readable dictionary originally intended for human users usin...


Efficient Optimal Recompression

An efficient variant of an optimal algorithm is presented, which reorganizes data that has been compressed by some on-the-fly compression method, into a more compact form, without changing the decoding procedure. The algorithm accelerates and improves the space requirements of a known technique based on a reduction to a graph-theoretic problem, by reducing the size of the graph, without affecting the...





Publication date: 2009